53 research outputs found

    Computational Statistics and Data Visualization

    This book is the third volume of the Handbook of Computational Statistics and covers the field of Data Visualization. In line with the companion volumes, it contains a collection of chapters by experts in the field to present readers with an up-to-date and comprehensive overview of the state of the art. Data Visualization is an active area of application and research and this is a good time to gather together a summary of current knowledge. Graphic displays are often very effective at communicating information. They are also very often not effective at communicating information. Two important reasons for this state of affairs are that graphics can be produced with a few clicks of the mouse without any thought, and that the design of graphics is not taken seriously in many scientific textbooks. Some people seem to think that preparing good graphics is just a matter of common sense (in which case their common sense cannot be in good shape) and others believe that preparing graphics is a low-level task, not appropriate for scientific attention. This volume of the Handbook of Computational Statistics takes graphics for Data Visualization seriously. Data Visualization, Exploratory Graphics.

    Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles

    Background: The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting the gene-by-array matrix map employed in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, SVD leading eigenvectors usually identify better global grouping and transitional structures. Results: This study proposes a flipping mechanism for a conventional agglomerative HCT using a rank-two ellipse (R2E, an improved SVD algorithm for sorting purposes) seriation by Chen [3] as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. Conclusion: We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid in the search for comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.

    A method for analyzing censored survival phenotype with gene expression data

    Background: Survival time is an important clinical trait for many disease studies. Previous works have shown certain relationships between patients' gene expression profiles and survival time. However, due to the censoring effects of survival time and the high dimensionality of gene expression data, effective and unbiased selection of a gene expression signature to predict survival probabilities requires further study. Method: We propose a method for an integrated study of survival time and gene expression. This method can be summarized as a two-step procedure: in the first step, a moderate number of genes are pre-selected using correlation or liquid association (LA). Imputation and transformation methods are employed for the correlation/LA calculation. In the second step, the dimension of the predictors is further reduced using the modified sliced inverse regression for censored data (censorSIR). Results: The new method is tested via both simulated and real data. For the real data application, we employed a set of 295 breast cancer patients and found a linear combination of 22 gene expression profiles that is significantly correlated with patients' survival rate. Conclusion: By an appropriate combination of feature selection and dimension reduction, we find a method of identifying gene expression signatures which is effective for survival prediction.

    Design of microarray probes for virus identification and detection of emerging viruses at the genus level

    BACKGROUND: Most virus detection methods are geared towards the detection of specific single viruses or just a few known targets, and lack the capability to uncover the novel viruses that cause emerging viral infections. To address this issue, we developed a computational method that identifies the conserved viral sequences at the genus level for all viral genomes available in GenBank, and established a virus probe library. The virus probes are used not only to identify known viruses but also for discerning the genera of emerging or uncharacterized ones. RESULTS: Using the microarray approach, the identity of the virus in a test sample is determined by the signals of both genus and species-specific probes. The genera of emerging and uncharacterized viruses are determined based on hybridization of the viral sequences to the conserved probes for the existing viral genera. A detection and classification procedure to determine the identity of a virus directly from detection signals results in the rapid identification of the virus. CONCLUSION: We have demonstrated the validity and feasibility of the above strategy with a small number of viral samples. The probe design algorithm can be applied to any publicly available viral sequence database. The strategy of using separate genus and species probe sets enables the use of a straightforward virus identity calculation directly based on the hybridization signals. Our virus identification strategy has great potential in the diagnosis of viral infections. The virus genus and specific probe database and the associated summary tables are available a

    Morus alba and active compound oxyresveratrol exert anti-inflammatory activity via inhibition of leukocyte migration involving MEK/ERK signaling

    Background: Morus alba has long been used in traditional Chinese medicine to treat inflammatory diseases; however, the scientific basis for such usage and the mechanism of action are not well understood. This study investigated the action of M. alba on leukocyte migration, one key step in inflammation. Methods: Gas chromatography-mass spectrometry (GC-MS) and cluster analyses of supercritical CO2 extracts of three Morus species were performed for chemotaxonomy-aided plant authentication. Phytochemistry and CXCR4-mediated chemotaxis assays were used to characterize the chemical and biological properties of M. alba and its active compound, oxyresveratrol. Fluorescence-activated cell sorting (FACS) and Western blot analyses were conducted to determine the mode of action of oxyresveratrol. Results: Chemotaxonomy was used to help authenticate M. alba. Chemotaxis-based isolation identified oxyresveratrol as an active component in M. alba. Phytochemical and chemotaxis assays showed that the crude extract, ethyl acetate fraction and oxyresveratrol from M. alba suppressed cell migration of Jurkat T cells in response to SDF-1. Mechanistic study indicated that oxyresveratrol diminished CXCR4-mediated T-cell migration via inhibition of the MEK/ERK signaling cascade. Conclusions: A combination of GC-MS and cluster analysis techniques is applicable for authentication of the Morus species. Anti-inflammatory benefits of M. alba and its active compound, oxyresveratrol, may involve the inhibition of CXCR4-mediated chemotaxis and the MEK/ERK pathway in T and other immune cells.

    Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3

    Genomics and proteomics of immune modulatory effects of a butanol fraction of Echinacea purpurea in human dendritic cells

    Background: Echinacea spp. extracts and the derived phytocompounds have been shown to induce specific immune cell activities and are popularly used as food supplements or nutraceuticals for immuno-modulatory functions. Dendritic cells (DCs), the most potent antigen presenting cells, play an important role in both innate and adaptive immunities. In this study, we investigated the specific and differential gene expression in human immature DCs (iDCs) in response to treatment with a butanol fraction containing defined bioactive phytocompounds extracted from stems and leaves of Echinacea purpurea, which we denoted [BF/S+L/Ep]. Results: Affymetrix DNA microarray results showed significant upregulation of specific genes for cytokines (IL-8, IL-1β, and IL-18) and chemokines (CXCL 2, CCL 5, and CCL 2) within 4 h after [BF/S+L/Ep] treatment of iDCs. Bioinformatics analysis of genes expressed in [BF/S+L/Ep]-treated DCs revealed a key-signaling network involving a number of immune-modulatory molecules leading to the activation of a downstream molecule, adenylate cyclase 8. Proteomic analysis showed increased expression of antioxidant and cytoskeletal proteins after treatment with [BF/S+L/Ep] and cichoric acid. Conclusion: This study provides information on candidate target molecules and molecular signaling mechanisms for future systematic research into the immune-modulatory activities of an important traditional medicinal herb and its derived phytocompounds.

    Designating eukaryotic orthology via processed transcription units

    Orthology is a widely used concept in comparative and evolutionary genomics. In addition to prokaryotic orthology, delineating eukaryotic orthology has provided insight into the evolution of higher organisms. Indeed, many eukaryotic ortholog databases have been established for this purpose. However, unlike in prokaryotes, alternative splicing (AS) has hampered eukaryotic orthology assignments. Therefore, existing databases likely contain ambiguous eukaryotic ortholog relationships and possibly misclassify alternatively spliced protein isoforms as in-paralogs, which are duplicated genes that arise following speciation. Here, we propose a new approach for designating eukaryotic orthology using processed transcription units, and we present an orthology database prototype using the human and mouse genomes. Currently existing programs cover less than 69% of the human reference sequences when assigning human/mouse orthologs. In contrast, our method encompasses up to 80% of the human reference sequences. Moreover, the ortholog database presented herein is more than 92% consistent with the existing databases. In addition to managing AS, this approach is capable of identifying orthologs of embedded genes and fusion genes using syntenic evidence. In summary, this new approach is sensitive and specific, and can generate a more comprehensive and accurate compilation of eukaryotic orthologs.

    Molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (SARS-CoV)

    BACKGROUND: Severe acute respiratory syndrome (SARS), a recent epidemic human disease, is caused by a novel coronavirus (SARS-CoV). First reported in Asia, SARS quickly spread worldwide through international travelling. As of July 2003, the World Health Organization reported a total of 8,437 people afflicted with SARS with a 9.6% mortality rate. Although immunopathological damages may account for the severity of respiratory distress, little is known about how the genome-wide gene expression of the host changes under the attack of SARS-CoV. RESULTS: Based on changes in gene expression of peripheral blood, we identified 52 signature genes that accurately discriminated acute SARS patients from non-SARS controls. While a general suppression of gene expression predominated in SARS-infected blood, several genes including those involved in innate immunity, such as defensins and eosinophil-derived neurotoxin, were upregulated. Instead of employing clustering methods, we ranked the severity of recovering SARS patients by generalized associate plots (GAP) according to the expression profiles of 52 signature genes. Through this method, we discovered a smooth transition pattern of severity from normal controls to acute SARS patients. The rank of SARS severity was significantly correlated with the recovery period (in days) and with the clinical pulmonary infection score. CONCLUSION: The use of the GAP approach has proved useful in analyzing the complexity and continuity of biological systems. The severity rank derived from the global expression profile of significantly regulated genes in patients may be useful for further elucidating the pathophysiology of their disease